Supported Technologies for Connecting Nodes in Data Pipeline Studio
While creating a data pipeline, you connect nodes of various stages to each other. The technologies supported in each stage depend on the capability you are using, such as data integration, data transformation, data quality, or data visualization. You must understand how the different stages of a data pipeline can be connected, depending on the technology you add to each stage.
The following sections list each capability, the technologies you can choose, and the supported combinations of source and target nodes in Data Pipeline Studio.
This is what a typical data ingestion pipeline looks like in the Lazsa Platform:
The Lazsa Platform supports the following technologies for the data integration stage in a data ingestion pipeline:
The following technologies are supported for source and target stages using Databricks for data integration:
| Source Stage | Data Integration Stage | Target Stage |
|---|---|---|
| CSV | Databricks | Amazon S3 or Snowflake |
| MS Excel | Databricks | Amazon S3 or Snowflake |
| Parquet | Databricks | Amazon S3 or Snowflake |
| FTP | Databricks | Amazon S3 or Snowflake |
| SFTP | Databricks | Amazon S3 or Snowflake |
| REST API | Databricks | Amazon S3 or Snowflake |
| Lazsa Ingestion Catalog | Databricks | Amazon S3 or Snowflake |
| Microsoft SQL Server | Databricks | Amazon S3 or Snowflake |
| MySQL | Databricks | Amazon S3 or Snowflake |
| Oracle | Databricks | Amazon S3 or Snowflake |
| PostgreSQL | Databricks | Amazon S3 or Snowflake |
| Snowflake | Databricks | Amazon S3 or Snowflake |
| Amazon S3 | Databricks | Amazon S3 or Snowflake |
| Kinesis Streaming | Databricks | Amazon S3 or Snowflake |
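The support matrix above can be read as a simple rule: any listed source can flow through a Databricks integration node into either an Amazon S3 or a Snowflake target. The sketch below (illustrative only, not Lazsa Platform code) encodes that rule as a validity check for a source → integration → target combination:

```python
# Illustrative sketch: checking a pipeline node combination against the
# Databricks data integration support matrix shown in the table above.
DATABRICKS_SOURCES = {
    "CSV", "MS Excel", "Parquet", "FTP", "SFTP", "REST API",
    "Lazsa Ingestion Catalog", "Microsoft SQL Server", "MySQL",
    "Oracle", "PostgreSQL", "Snowflake", "Amazon S3", "Kinesis Streaming",
}
DATABRICKS_TARGETS = {"Amazon S3", "Snowflake"}

def is_supported(source: str, integration: str, target: str) -> bool:
    """Return True if this source/integration/target combination
    appears in the Databricks support matrix."""
    return (integration == "Databricks"
            and source in DATABRICKS_SOURCES
            and target in DATABRICKS_TARGETS)

print(is_supported("CSV", "Databricks", "Amazon S3"))        # True
print(is_supported("Salesforce", "Databricks", "Snowflake")) # False
```

Salesforce is rejected here because it is ingested through Amazon AppFlow (see the next table), not through Databricks.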
The following technologies are supported for source and target stages using Amazon AppFlow for data integration:

| Source Stage | Data Integration Stage | Target Stage |
|---|---|---|
| Salesforce | Amazon AppFlow | Amazon S3 or Snowflake |
| ServiceNow | Amazon AppFlow | Amazon S3 |
The following combination is supported using Snowflake Bulk Ingest for data integration:

| Source Stage | Data Integration Stage | Target Stage |
|---|---|---|
| Amazon S3 (Data Lake) | Snowflake Bulk Ingest | Snowflake |
The following combination is supported using Snowflake Stream Ingest for data integration:

| Source Stage | Data Integration Stage | Target Stage |
|---|---|---|
| Amazon S3 (Data Lake) | Snowflake Stream Ingest | Snowflake |
This is what a typical data transformation pipeline looks like in the Lazsa Platform:
The Lazsa Platform provides data transformation using templatized jobs or custom jobs, depending on the code that you use. Templatized jobs include join, union, and aggregate functions that group or combine data. For complex operations on data, Data Pipeline Studio (DPS) provides the option of creating custom transformation jobs.
The Lazsa Platform supports data transformation using the following technologies:
| Data Lake | Data Transformation Stage | Target Stage |
|---|---|---|
| Snowflake | Databricks (custom transformation) | Snowflake |
| Amazon S3 | Databricks (templatized and custom transformation) | Amazon S3 |
| Source Stage | Data Transformation Stage | Target Stage |
|---|---|---|
| Snowflake | Snowflake (templatized and custom transformation) | Snowflake |
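Conceptually, a templatized "aggregate" job groups rows by a key and combines a numeric column. The sketch below illustrates that idea in plain Python (illustration only; in the Lazsa Platform this logic runs as a Databricks or Snowflake job, and the column names are hypothetical):

```python
# Conceptual sketch of a templatized "aggregate" transformation:
# group rows by a key column and sum a value column.
from collections import defaultdict

def aggregate(rows, group_key, value_key):
    """Group rows by group_key and sum value_key per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

# Hypothetical sample data standing in for a data lake table.
orders = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 50},
]
print(aggregate(orders, "region", "amount"))  # {'east': 170, 'west': 80}
```

A custom transformation job would replace this fixed template with arbitrary user-supplied code for more complex operations.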
This is what a typical data quality pipeline looks like in the Lazsa Platform:
You can perform data quality checks using Databricks or Snowflake data quality capabilities on a dataset available in an Amazon S3 data lake or a Snowflake data lake. The technology you use depends entirely on your organizational preference.
The Lazsa Platform provides various tools for data quality:
Data quality on an Amazon S3 data lake:

| Data Quality Stage | Target Stage |
|---|---|
| Data Profiler | Amazon S3 |
| Data Analyzer | Amazon S3 |
| Data Issue Resolver | Amazon S3 |
Data quality on a Snowflake data lake:

| Data Quality Stage | Target Stage |
|---|---|
| Data Profiler | Snowflake |
| Data Analyzer | Snowflake |
| Data Issue Resolver | Snowflake |
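To illustrate the kind of statistics a profiling stage typically computes, here is a minimal sketch (hypothetical example; actual profiling runs on Databricks or Snowflake inside the pipeline, not in local Python):

```python
# Illustrative sketch of basic column profiling: row count,
# null count, and distinct-value count for one column.
def profile_column(values):
    """Return basic profile statistics for one column of data."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }

ages = [34, None, 29, 34, None]  # hypothetical column data
print(profile_column(ages))  # {'count': 5, 'nulls': 2, 'distinct': 2}
```

An analyzer or issue-resolver stage would go further, evaluating rules against such statistics and correcting or quarantining failing records.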
This is what a typical data visualization pipeline looks like in the Lazsa Platform.
The following technologies are supported for data visualization:
This is what a typical data analytics pipeline looks like in the Lazsa Platform.
The Lazsa Platform provides the option of using predefined algorithms, such as Random Forest Classifier and Support Vector Classifier, or creating custom algorithms according to your specific requirements.
| Source Stage | Data Analytics Stage | Target Stage |
|---|---|---|
| Snowflake | Python with JupyterLab | Snowflake |
| Amazon S3 | Python with JupyterLab | Snowflake |
| Snowflake | Python with JupyterLab | Amazon S3 |
| Amazon S3 | Python with JupyterLab | Amazon S3 |
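A custom analytics job in the Python with JupyterLab stage follows a fit-then-predict pattern. The sketch below uses a deliberately tiny nearest-centroid classifier as a stand-in (a real job would typically use a predefined algorithm such as a Random Forest Classifier from a library like scikit-learn, reading from and writing to the configured source and target stages):

```python
# Minimal fit/predict sketch standing in for a custom analytics job.
def nearest_centroid_fit(X, y):
    """Compute one centroid (mean feature vector) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

# Hypothetical training data; in a pipeline this would be loaded
# from the Amazon S3 or Snowflake source stage.
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y = ["low", "low", "high", "high"]
model = nearest_centroid_fit(X, y)
print(nearest_centroid_predict(model, [0.1, 0.0]))  # low
```

The predictions would then be written to the Amazon S3 or Snowflake target stage, as listed in the table above.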
What's next? Create a Data Pipeline